Title: Scheduling the allocation of data fragments in a distributed
database environment: A machine learning approach 

Author: Chaturvedi, Alok R. ; Choubey, Ashok K. ; Roan, Jinsheng 
Corporate Source: Purdue Univ, West Lafayette, IN, USA 

Source: IEEE Transactions on Engineering Management v 41 n 2 May 1994. p 
194-207 

Publication Year: 1994 

CODEN: IEEMA4 ISSN: 0018-9391 

Language: English 

Document Type: JA; (Journal Article) Treatment: G; (General Review); T; 
(Theoretical) 

Journal Announcement: 9409W5 

Abstract: Different database fragmentation and allocation strategies have
been proposed to partially replicate data in a partitioned, distributed
database (DDB) environment. The replication strategies include database
snapshots, materialized views, and quasi-copies. These strategies are
'static' and do not adapt to changes in the data usage patterns.
Furthermore, they often require expensive update synchronizations to
maintain data consistency and do not exploit the knowledge embedded in the
query history. This paper describes a machine learning based time invariant
fragmentation method (MLTIF) that acquires knowledge about the data usage
patterns for each node. Based on this knowledge, MLTIF designs time
invariant fragments (TIF) and schedules their allocation and selective
update for a specified time period. Simulation is used to compare the
effectiveness of the MLTIF approach with that of full replication,
materialized views, and non-replication strategies. Initial results
indicate that for most normal operating conditions, the MLTIF approach can
be effective. (Author abstract) 21 Refs.
Learning systems; Scheduling; Data processing; Large scale systems;
Reliability

Identifiers: Data fragment allocation; Machine learning based time 
invariant fragmentation method; Time invariant fragmentation 
The research question in this thesis concerns how to parallelize the
spatial range and join query processing in order to support a high
performance spatial database application. Data partitioning for the range
query operation involves declustering of spatial data, while data
partitioning for the spatial join involves clustering of spatial data.
If the static partitioning methods fail to equally distribute the load
among different processors, the load balance may be improved by
redistributing parts of the data to idle processors using Dynamic
Load-Balancing (DLB) techniques.

In this thesis, we provide a framework for declustering collections of 
extended spatial objects by identifying the following key issues: (i) the 
work-load metric, (ii) the spatial-extent of the work-load, (iii) the 
distribution of the work-load over the spatial-extent, and (iv) the 
declustering method. We identify and experimentally evaluate alternatives 
for each of these issues. 

We also provide a framework for dynamically balancing the
load between different processors. We experimentally evaluate the
proposed declustering and load-balancing methods on a distributed-memory
MIMD machine (Cray T3D) and a shared-memory machine (SGI Challenge).
Experimental results show that the spatial-extent and the work-load metric 
are important issues in developing a declustering method. Experiments also 
show that the replication of data is usually needed to facilitate dynamic 
load-balancing, as the cost of local processing is often less than the cost 
of data transfer for extended spatial objects. In addition, we also show 
that the effectiveness of dynamic load-balancing techniques can be improved 
by using declustering methods to determine the subsets of spatial objects 
to be transferred during run-time. 

A spatial join is often performed in two steps: a filter step and a
refinement step. In this thesis, we focus on the refinement step of the
spatial join. The refinement step takes as input a sequence of pairs of
tuples and checks each pair to see if the join predicate is satisfied
for that pair. This is similar to the join index
processing done in traditional relational databases. We develop min-cut
graph partitioning based methods for join processing using a join index.
We use min-cut graph partitioning as a new heuristic for solving the page
access sequence problem for a fixed-size buffer in sequential systems. We
show that the number of page accesses needed to compute a join using a join
index in a fixed buffer environment is bounded by the sum of the sizes of
the base relations and the size of the cut-set of the page connectivity
graph. Since min-cut graph partitioning aims to minimize the size of
the cut-set, the proposed heuristic is a direct method. Experiments with
benchmark data sets show that the graph-partitioning based heuristic
outperforms the existing methods, particularly when join selectivity is
high and buffer space is small. (Abstract shortened by UMI.)
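
The filter step and refinement step named in the abstract above can be illustrated with a small sketch. It is not the thesis's method: circles stand in for extended spatial objects so that the exact test stays short, the filter is a plain nested loop over minimum bounding rectangles, and all function names are hypothetical.

```python
from itertools import product
from math import hypot

# Hypothetical object model: a circle given as (cx, cy, r).

def mbr(c):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a circle."""
    cx, cy, r = c
    return (cx - r, cy - r, cx + r, cy + r)

def mbr_overlap(a, b):
    """Cheap filter predicate: do two rectangles intersect?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def exact_overlap(c1, c2):
    """Costlier refinement predicate: do the circles themselves intersect?"""
    return hypot(c1[0] - c2[0], c1[1] - c2[1]) <= c1[2] + c2[2]

def filter_step(R, S):
    """Candidate index pairs whose MBRs overlap (a superset of the join result)."""
    return [(i, j)
            for (i, r), (j, s) in product(enumerate(R), enumerate(S))
            if mbr_overlap(mbr(r), mbr(s))]

def refinement_step(R, S, candidates):
    """Check the exact join predicate on each candidate pair."""
    return [(i, j) for i, j in candidates if exact_overlap(R[i], S[j])]

R = [(0.0, 0.0, 1.0), (5.0, 5.0, 1.0)]
S = [(1.5, 1.5, 1.0), (0.5, 0.0, 0.2), (9.0, 9.0, 0.5)]
candidates = filter_step(R, S)
result = refinement_step(R, S, candidates)
```

Here the pair (0, 0) passes the filter because the rectangles overlap, but is discarded by the refinement step because the circles themselves do not touch.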
In this dissertation, an array processor connected in an
n-dimensional mesh is evaluated for very large relational database
applications. Each node is assumed to have substantial processing
capability with substantial amount of memory. By partitioning each base
relation among all nodes, and by storing them directly within primary
memory of a memory-resident database machine, the bottleneck of slow
secondary storage access is eliminated. Whenever inter-node operations
are required, local partitions are routed to other nodes.

In this dissertation, an n-dimensional mesh is studied for SIMD 
database processing. Database algorithms are classified into five 
categories based on their routing requirements: (1) database algorithms 
which do not require any routing such as select; (2) database algorithms 
that require partial routing without order such as tuple balancing; (3) 
database algorithms that require partial routing with order such as 
project, elimination of duplicates, and sort; (4) database algorithms that 
require full routing such as nested-loop, sort-merge join algorithms, and 
cartesian product; (5) database algorithms that require random routing such 
as hash-based join and hash-based aggregate algorithms. Routing algorithms 
appropriate for these different classes of database algorithms are 
presented, and their performances are analyzed. Their performances on an 
n-dimensional mesh are compared with their performances on a binary cube, 
which is a subset of the n-dimensional mesh. Incremental expansion is 
studied on an n-dimensional mesh and its impact on performance is measured. 

The design of a large parallel computer is often limited by the 
longest interconnection. In an n-dimensional mesh, the method of 
interconnection, individual interconnection length and node architecture 
remain identical as the computer is expanded. A custom node architecture 
maximizes processing and communication parallelism. Another node 
architecture uses standard microprocessor components. 

Several n-dimensional meshes are connected through serial
communication links in the (n + 1)th dimension to support even larger
database applications. Each n-dimensional mesh may execute a different
database operation or query, resulting in an MIMD database computer. Two
query execution strategies are presented and their performance is studied
using simulation models.
Journal: International Journal of Electronic Commerce vol. 4, no. 3
p. 45-67
Publication Date: Spring 2000 Country of Publication: USA
Material Identity Number: G303-2000-002
Language: English Document Type: Journal Paper (JP)
Abstract: This paper describes a strategy for actively replicating and
updating content for electronic commerce. Active replication and updating
of content is achieved by intelligent agents (IA) using a time-invariant
fragmentation approach to partitioning and replicating data in a
distributed computing environment. Taking into account the time-sensitivity
property of data, IAs derive time-invariant fragments for their respective
nodes from the query history. A time-invariant fragment (TIF) is that
portion of the database whose values do not change during a specified time
interval. The algorithm that IAs use in creating TIFs for each node, for
a given time interval, is presented. The active replication approach is
compared with three other approaches, full-replication, nonreplication, and
materialized view, in terms of data transmission costs. Results
indicate that the active approach can be most effective for electronic
commerce because of the high percentage of modification queries, the large
size of the network, and the great number of transactions. (21 Refs)
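
The TIF definition in the abstract above can be made concrete with a short sketch. The abstract does not give the agents' algorithm, so this only illustrates the definition under stated assumptions: the query history is a list of hypothetical `Query` records, and a past window of that history is used as a proxy for the "specified time interval".

```python
from collections import namedtuple

# Hypothetical history record; kind is "read" or "update".
Query = namedtuple("Query", "node kind item time")

def time_invariant_fragment(history, node, start, end):
    """Items read at `node` that no query modified during [start, end)."""
    in_window = [q for q in history if start <= q.time < end]
    modified = {q.item for q in in_window if q.kind == "update"}
    read_here = {q.item for q in in_window
                 if q.kind == "read" and q.node == node}
    return read_here - modified

history = [
    Query("A", "read", "price:42", 1),
    Query("A", "read", "stock:42", 2),
    Query("B", "update", "stock:42", 3),
    Query("A", "read", "desc:42", 4),
    Query("B", "read", "price:42", 5),
]
tif_a = time_invariant_fragment(history, "A", 0, 10)
tif_b = time_invariant_fragment(history, "B", 0, 10)
```

In this toy history, `stock:42` is excluded from node A's fragment because node B updates it inside the window, so a local copy would need synchronization.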
Subfile: C 

Descriptors: concurrency control; electronic commerce; replicated 
databases; software agents 
intelligent agents; distributed computing environment; replicating data;
Publisher: Springer-Verlag, Berlin, Germany
Conference Title: Database and Expert Systems Applications. 7th
International Conference, DEXA '96 Proceedings 

Conference Date: 9-13 Sept. 1996 Conference Location: Zurich, 
Switzerland 

Language: English Document Type: Conference Paper (PA) 
Treatment: Theoretical (T) 

Abstract: Presents a data partitioning technique for shared-nothing
database systems. A unique feature of our scheme is that it organizes a
multicomputer system into groups of even numbers of processing nodes,
and each relation is assigned to one of these groups in such a way as to
minimize contention among concurrent queries. Thus, a fixed degree of
declustering is used for all base relations in this scheme. Our
simulation results demonstrate that this approach provides significantly
better performance than conventional methods which independently determine
a degree of declustering for each of the base relations. These schemes
totally ignore the requirement of inter-query parallelism. Obviously, an
appropriate degree of declustering represents a good trade-off between
inter-query and intra-query parallelism for our strategy. To investigate
this issue, we perform extensive simulations to study the effect of various
system and workload parameters on the optimality of the degree of
declustering. We found that it is influenced primarily by the parallel
processing overhead. With this finding, we develop a mathematical model to
determine the optimal degree of declustering for a given system. (10 Refs)
Abstract: Presents a new approach to parallel computation of transitive
closure queries using a semantic data fragmentation. Tuples of a large
base relation denote edges in a graph, which models a transportation
network. A fragmentation algorithm is proposed which produces a
partitioning of the base relation into several fragments such that
any fragment corresponds to a subgraph. One fragment, called high-speed
fragment, collects all edges which guarantee maximum speed. Thus, the
fragmentation algorithm induces a hierarchical relationship between the
high-speed fragment and all other fragments. With this fragmentation, any
query about paths connecting two nodes can be answered by using just the
fragments in which nodes are located and the high-speed fragment. In
general, if each fragment is managed by a distinguished processor, then
the query can be answered by three processors working in parallel. This
schema can be applied recursively to generate an arbitrary number of
hierarchical levels. (15 Refs)
Subfile: C 
Since data warehousing has become a major field of research there has
been a lot of interest in the selection of materialized views for query
optimization. The problem is to find the set of materialized views
which yields the highest cost savings for a given set of queries under a
certain space constraint. The analytical perspective results in queries
which on the one hand require aggregations but on the other hand are quite
restrictive with regard to the fact data. Usually there are "hot spots",
i.e. regions which are requested very frequently, like the current period
or the most important product group. However, most algorithms in the
literature do not consider restrictions of queries and therefore generate
only views containing all summary data at a certain aggregation level,
although the space they occupy could better be used for other, more
beneficial views. This article presents an algorithm for the selection of
restricted views. The cost savings using this algorithm have been
experimentally evaluated to be up to 80% by supplying only 5% additional
space.
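
The selection problem stated above (highest cost savings under a space constraint) can be sketched with a generic greedy heuristic that ranks candidate views by savings per unit of space. This is not the article's algorithm, which the abstract does not describe; the view names, sizes and benefit figures are invented for illustration only.

```python
def select_views(candidates, space_budget):
    """Greedy selection by benefit density.

    candidates: list of (name, size, benefit) with size > 0.
    Returns the chosen view names and the space they use.
    """
    chosen, used = [], 0
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    for name, size, _benefit in ranked:
        if used + size <= space_budget:
            chosen.append(name)
            used += size
    return chosen, used

# Hypothetical candidates: one unrestricted view and two restricted
# "hot spot" views (current period, most important product group).
candidates = [
    ("sales_by_month_all", 100, 300),
    ("sales_by_month_current_year", 10, 200),
    ("sales_by_day_top_group", 20, 180),
]
chosen, used = select_views(candidates, space_budget=40)
```

With a small budget the two restricted views fit while the unrestricted one does not, which mirrors the abstract's point that space spent on full summary data could serve more beneficial views. A greedy ranking like this is a heuristic and does not guarantee an optimal set.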

English Descriptors: Query; Database; Multidimensional system; Grain size
distribution; Aggregate; Grid pattern; Node; Algorithm; Data warehouse

French Descriptors: Question documentaire; Base donnee; Systeme n 
dimensions; Granularite; Agregat; Maillage; Noeud; Algorithme; Vue 
Computer-implemented method of maintaining complex grouping expression
tables of transactions, involves building modification data stream for
materialized view maintained with table, based on base table
modifications
US 20030093407 A1 25 G06F-007/00 Provisional application US 99135277

Abstract (Basic): US 20030093407 A1

NOVELTY - A data stream comprising modifications to be propagated
to a materialized view maintained with complex grouping
expressions, is built after performing a modification to a base table
in a transaction. The built data stream is applied to the materialized
view.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the
following:

(1) computer program product for maintaining complex grouping
expression tables of transactions; and

(2) computer-implemented system for maintaining complex grouping
expression tables of transactions.

USE - For maintaining complex grouping expression tables of
transactions.

ADVANTAGE - Avoids full re-computation of queries for updating the
materialized views (i.e. automatic summary tables) of the database
tables, by building a data stream comprising table modifications for
the materialized view and applying the data stream to the view.

DESCRIPTION OF DRAWING(S) - The figure shows the hardware
configuration of the table maintenance system.

pp; 25 DwgNo 1/9
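
The idea in the record above, building a stream of modifications and applying it to the view instead of recomputing the query, can be sketched for the simplest case. This is an assumption-laden illustration, not the patented method: the view is a plain GROUP BY with COUNT and SUM rather than a complex grouping expression, and the function names and record layout are hypothetical.

```python
def build_delta_stream(inserted, deleted):
    """Turn base-table modifications into signed (key, amount, sign) records."""
    return ([(k, v, +1) for k, v in inserted] +
            [(k, v, -1) for k, v in deleted])

def apply_delta_stream(view, stream):
    """Apply the stream in place; view maps group key -> [row_count, total]."""
    for key, amount, sign in stream:
        count, total = view.get(key, [0, 0])
        count, total = count + sign, total + sign * amount
        if count == 0:
            view.pop(key, None)   # last row of the group was deleted
        else:
            view[key] = [count, total]
    return view

# Summary table before the transaction: region -> [COUNT(*), SUM(amount)].
view = {"east": [2, 30], "west": [1, 5]}
stream = build_delta_stream(inserted=[("east", 10), ("north", 7)],
                            deleted=[("west", 5)])
apply_delta_stream(view, stream)
```

Keeping the row count next to the sum is what lets a group be dropped when its last base row is deleted, without rescanning the base table.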
Abstract (Basic): KR 2002046786 A

NOVELTY - A method for preparing a query using a materialized
view and a dimension layer in a data warehouse is provided. It enhances
the performance of a data warehouse system by automatically creating a
new query that returns the same result as the query created by the user
but can be processed more efficiently using multiple materialized
views, and by processing the new query in place of the original.

DETAILED DESCRIPTION - The method rewrites a query created by a user
into a new query using a dimension layer and multiple materialized
views that exist in a data warehouse system storing a large amount of
data. A normalized form is defined for the query and the materialized
views using a group grid induced from the dimension layers (S100). It
is checked whether each materialized view can be used in the
preparation of the query (S110). Materialized views are selected to be
used in the preparation of the query (S120). A query block is created
for each selected materialized view (S130). The new query is created
by integrating the created query blocks (S170).
Abstract (Basic): US 6345272 B1

NOVELTY - The method involves receiving an aggregate query that
places a restriction on an ordered dimension. The restriction is
specified at a first level of granularity for the dimension. The
aggregate query does not reference a materialized view that groups
results at a second level of granularity of the ordered dimension. The
second level of granularity is coarser than the first.

DETAILED DESCRIPTION - It is determined whether the materialized
view satisfies each condition of a first set of conditions. If the
materialized view satisfies each condition, then the query is
rewritten to produce a query that references the materialized view
and includes a rewritten restriction.

INDEPENDENT CLAIMS are included for a computer-readable medium and
for a database system.

USE - For rewriting queries to access a materialized view.

ADVANTAGE - Rewrite mechanism is capable of rewriting queries to
access materialized views that would otherwise not be rewritten.
Does not depend on summary tables.

DESCRIPTION OF DRAWING(S) - The figure shows a dimension hierarchy.
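
A rewritten restriction of the kind described above can be illustrated for a time dimension. The abstract does not list the patent's conditions, so the check used here (a day-level range that covers whole months) is only an assumed example of one plausible condition, and the function names are hypothetical.

```python
import calendar
from datetime import date

def covers_whole_months(start, end):
    """True if [start, end] begins on a month's first day and ends on a last day."""
    last_day = calendar.monthrange(end.year, end.month)[1]
    return start.day == 1 and end.day == last_day

def rewrite_restriction(start, end):
    """Map a day-level range to a month-level range, or None if not possible.

    A None result means a month-grouped view cannot answer the query
    under this assumed condition.
    """
    if start > end or not covers_whole_months(start, end):
        return None
    return (start.year, start.month), (end.year, end.month)

aligned = rewrite_restriction(date(2001, 1, 1), date(2001, 3, 31))
misaligned = rewrite_restriction(date(2001, 1, 15), date(2001, 3, 31))
```

When the rewrite succeeds, a query restricted to those days could in principle be answered from a view grouped by month by restricting on the returned (year, month) bounds.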
Query optimization method for computer system, involves rewriting query
using materialized views having partitioning or replication
properties different from properties specified in reference tables of
query
Abstract (Basic): US 6339769 B1

NOVELTY - Existence of multiple materialized views having
partitioning or replication properties different from the properties
specified in the reference tables in a query, is determined after
accepting the query. Query is rewritten using the materialized views
after analyzing a portion of the query using the materialized views.
Rewritten query is executed using the materialized views.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the
following:

(a) Query optimization apparatus;

(b) a computer program.

USE - For optimization of queries by transparently altering
properties of relational tables using materialized views in a
database management system.

ADVANTAGE - Optimizes queries using materialized views that are
replicated and partitioned across multiple processors, and optimizes
RDBMS software using replicated and partitioned copies of
materialized views.

DESCRIPTION OF DRAWING(S) - The figure shows the flowchart
explaining the query optimization method.
Abstract (Basic): WO 9809238 A

The method involves using materialised views to compute answers 
to SQL queries with grouping and aggregation. The materialised 
view is semantically analysed to determine whether the materialised 
view can be used in evaluating an input query. 

If the view is usable, the input query is rewritten to produce an 
output query that is multi-set equivalent to the input query, and that 
specifies one or more occurrences of the materialised view as a 
source of information to be returned by the output query. The output 
query is then evaluated. 

USE - Using materialised view to compute and evaluate SQL 
queries with grouping and aggregation in query optimisation, data 
warehousing, very large transaction recording systems and mobile 
computing. 
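
The notion of an output query that is multi-set equivalent to the input query can be shown on a toy example. This sketch uses Python's `sqlite3` module; SQLite has no materialised views, so a summary table (`mv_sales`, a hypothetical name) stands in for one, and the rewrite is written by hand rather than produced by the semantic analysis the record describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 'a', 10), ('east', 'b', 20), ('west', 'a', 5), ('west', 'a', 7);
    -- Stand-in for a materialised view grouped at a finer level.
    CREATE TABLE mv_sales AS
        SELECT region, product, SUM(amount) AS total, COUNT(*) AS n
        FROM sales GROUP BY region, product;
""")

# Input query over the base table.
original = ("SELECT region, SUM(amount), COUNT(*) FROM sales "
            "GROUP BY region ORDER BY region")
# Hand-written output query over the summary table: sums are re-summed,
# and the stored counts are summed in place of COUNT(*).
rewritten = ("SELECT region, SUM(total), SUM(n) FROM mv_sales "
             "GROUP BY region ORDER BY region")

original_rows = conn.execute(original).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()
conn.close()
```

The rewrite works here because SUM and COUNT can be recombined from partial aggregates; storing the count in the summary table is what makes COUNT(*) recoverable.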
S1  1067781  CLUSTER? OR GROUP? OR PARTITION? OR CATEGC
S2      167  MATERIAL?()VIEW? OR (STORE? OR SAVE? OR CACHE?)(N)QUER? OR
             (AUXILIAR? OR BASE)()RELATION?
S3       18  S1 AND S2
S4        7  S3 AND (NODE? OR LOCATION? OR PROCESSOR?)
Method for preparing query using materialized view and dimension
layer in data warehouse